A Compile-Time Data Locality Optimization Framework for NUCA Chip Multiprocessors

Authors

Qingda Lu

Uday Bondhugula

Sriram Krishnamoorthy

P. Sadayappan

Yongjian Chen

Haibo Lin

Tin-Fook Ngai

Abstract

With increasing numbers of cores, future CMPs (Chip MultiProcessors) are likely to have a tiled architecture with a portion of shared L2 cache on each tile and a bank-interleaved distribution of the address space. For data-parallel programming models, there is a mismatch between such a non-uniform cache organization and the canonical row-major or column-major layouts of multi-dimensional arrays, causing a significant number of non-local L2 accesses for many commonly occurring data access patterns. In this paper, we develop a compile-time framework for data locality optimization via data layout transformation. Using a polyhedral model for dependences, the program’s localizability is determined by analysis of intra- and inter-statement dependences, followed by non-canonical data layout transformation to reduce non-local accesses for localizable computations. Simulation-based results on a 16-core 2D tiled CMP demonstrate the effectiveness of the approach.
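The mismatch described in the abstract can be illustrated with a toy model. The sketch below is not the paper's framework: it assumes a one-dimensional array, a 64-byte L2 line, 8-byte elements, line-granularity round-robin interleaving across 16 banks, and a simple block distribution of elements to cores. The function names (`home_bank`, `owner_core`, `canonical_position`, `localized_position`, `local_fraction`) are hypothetical and exist only for this illustration.

```python
# Toy model of bank-interleaved L2 and a non-canonical layout.
# All parameters except the tile count are assumptions for illustration.
NUM_TILES = 16        # 16-core tiled CMP, as in the abstract
LINE_BYTES = 64       # assumed L2 line size
ELEM_BYTES = 8        # assumed element size (e.g. a double)
ELEMS_PER_LINE = LINE_BYTES // ELEM_BYTES


def home_bank(position):
    """Bank holding the element stored at `position`, assuming
    consecutive cache lines are interleaved round-robin across banks."""
    return (position // ELEMS_PER_LINE) % NUM_TILES


def owner_core(i, n):
    """Core that computes on logical element i under a block distribution."""
    return i // (n // NUM_TILES)


def canonical_position(i, n):
    """Canonical layout: logical element i is stored at position i."""
    return i


def localized_position(i, n):
    """Hypothetical non-canonical layout: the lines of core p's block
    are strided so that each one lands in bank p.
    Assumes n is a multiple of NUM_TILES * ELEMS_PER_LINE."""
    chunk = n // NUM_TILES
    core, k = divmod(i, chunk)                 # owning core, offset in its block
    line, offset = divmod(k, ELEMS_PER_LINE)   # line within block, offset in line
    return (line * NUM_TILES + core) * ELEMS_PER_LINE + offset


def local_fraction(layout, n):
    """Fraction of elements whose home bank is on the owning core's tile."""
    local = sum(1 for i in range(n)
                if home_bank(layout(i, n)) == owner_core(i, n))
    return local / n


if __name__ == "__main__":
    n = NUM_TILES * ELEMS_PER_LINE * 32
    print("canonical layout:", local_fraction(canonical_position, n))
    print("localized layout:", local_fraction(localized_position, n))
```

Under these assumptions, the canonical layout leaves only one in sixteen of a core's elements in its own bank, while the permuted layout makes every access local. The multi-dimensional arrays and dependence analysis the abstract refers to are beyond what this sketch models.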

Similar resources

Judicious Thread Migration When Accessing Distributed Shared Caches

Chip-multiprocessors (CMPs) have become the mainstream chip design in recent years; for scalability reasons, designs with high core counts tend towards tiled CMPs with physically distributed shared caches. This naturally leads to a Non-Uniform Cache Architecture (NUCA) design, where on-chip access latencies depend on the physical distances between requesting cores and home cores where the data i...

Adaptive Zone-Aware Multi-bank on Chip last level L2 Cache Partitioning for Chip Multiprocessors

This paper proposes a novel, efficient Non-Uniform Cache Architecture (NUCA) scheme for the Last-Level Cache (LLC) to reduce the average on-chip access latency and improve core isolation in Chip Multiprocessors (CMPs). The proposed architecture is expected to improve upon the various NUCA schemes proposed so far, such as S-NUCA, D-NUCA and SP-NUCA [9][10][5], in terms of average access latency witho...

3D Tree Cache – A Novel Approach to Non-Uniform Access Latency Cache Architectures for 3D CMPs

We consider a non-uniform access latency cache architecture (NUCA) design for 3D chip multiprocessors (CMPs) where cache structures are divided into small banks interconnected by a network-on-chip (NoC). In earlier NUCA designs, data is placed in banks either statically (S-NUCA) or dynamically (D-NUCA). In both S-NUCA and D-NUCA designs, scaling to hundreds of cores can pose several challenges. ...

Analysis of Non-Uniform Cache Architecture Policies for Chip-Multiprocessors Using the Parsec Benchmark Suite

Non-Uniform Cache Architectures (NUCA) have been proposed as a solution to overcome wire delays that will dominate on-chip latencies in Chip Multiprocessor designs in the near future. This novel means of organization divides the total memory area into a set of banks that provides non-uniform access latencies and thus faster access to those banks that are close to the processor. A NUCA model can ...

Performance Analysis of Non-Uniform Cache Architecture Policies for Chip-Multiprocessors Using the Parsec v2.0 Benchmark Suite

Non-Uniform Cache Architectures (NUCA) have been proposed as a solution to overcome wire delays that will dominate on-chip latencies in Chip Multiprocessor designs in the near future. This novel means of organization divides the total memory area into a set of banks that provides non-uniform access latencies and thus faster access to those banks that are close to the processor. A NUCA model can...

Publication date: 2008
